feat(auto-routing): Morph model router decisions for kilo-auto tiers #4005

Draft
shreybirmiwalmorph wants to merge 1 commit into Kilo-Org:main from shreybirmiwalmorph:feat/morph-router-auto-routing

@shreybirmiwalmorph

Summary

Adds per-prompt model routing for kilo-auto/* tiers via the Morph model router, running in shadow mode inside the auto-routing worker. The worker's response contract has carried decision: null since it shipped — this PR fills that slot with a real decision engine while leaving serving behavior untouched.

How it fits the existing architecture:

  • Gateway (apps/web): the existing background mirror now includes a routing context for kilo-auto requests, consisting of the tier, the candidate models that tier may route among (new getMorphRouterCandidates, a product-owned set in auto-model/index.ts), and the model the static resolver actually picked. The field is optional, so gateway and worker deploys need no coordination.
  • Worker (services/auto-routing): new morph-router.ts calls POST /v1/router/multimodel with the tier's candidates (mapped Kilo public id ↔ Morph catalog id; unmapped candidates dropped), a tier-derived policy (frontier → capability_heavy, balanced → balanced), and the static pick as default_model so ambiguous prompts stay behavior-identical. Runs in parallel with the existing classifier; either side failing never affects the other.
  • Caching: decisions reuse the per-conversation Durable Object cache, keyed by content hash + candidate-set/policy fingerprint, so multi-turn conversations stick to one model and config changes invalidate cleanly.
  • Controls & telemetry: everything is behind a morph_router_enabled KV flag (default off, fail-closed). Decisions emit a sampled auto_routing_router_decision log line carrying the static pick vs. routed model, difficulty/confidence/domain, and latency — the dataset for deciding whether to serve these decisions later.
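
To make the caching bullet concrete, here is a minimal sketch of how a decision cache key could combine a content hash with a candidate-set/policy fingerprint. The names decisionCacheKey and candidateFingerprint, the key layout, and the model ids in the usage note are illustrative assumptions, not the code in this PR; the content hash is taken as an input because the description does not say how it is computed.

```typescript
// Hypothetical sketch: these helpers are not taken from the PR.
type Policy = "capability_heavy" | "balanced";

// Canonical string for the routing config. Candidates are sorted so that
// reordering the same set does not change the fingerprint.
function candidateFingerprint(candidates: string[], policy: Policy): string {
  const sorted = candidates.slice().sort();
  return policy + "|" + sorted.join(",");
}

// Any change to the candidate set or the policy yields a different key,
// so decisions cached under the old config are simply never looked up again.
function decisionCacheKey(
  contentHash: string,
  candidates: string[],
  policy: Policy,
): string {
  return contentHash + ":" + candidateFingerprint(candidates, policy);
}
```

With placeholder ids, decisionCacheKey("abc", ["m/b", "m/a"], "balanced") and decisionCacheKey("abc", ["m/a", "m/b"], "balanced") produce the same key, while switching the policy or adding a candidate produces a new one.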

Morph only ever receives one bounded user-prompt prefix (≤1000 chars) — never the conversation, system prompt, tools, or any identifiers. Org/enterprise traffic is unaffected (the mirror already skips it).
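
As a rough illustration of that bound, the sketch below extracts a single user-prompt prefix from a message list. The ChatMessage shape, the function name, and the choice of the most recent user message are assumptions made for the example; only the 1000-character limit comes from this description.

```typescript
// Hypothetical sketch: the message shape and selection rule are assumptions.
const MAX_PREFIX_CHARS = 1000; // limit stated in the PR description

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Returns at most MAX_PREFIX_CHARS characters of one user message, or null
// when the conversation has no user message. System prompts, assistant
// turns, and tool output are never read.
function boundedUserPromptPrefix(messages: ChatMessage[]): string | null {
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") {
      // slice counts UTF-16 code units, so a surrogate pair at the
      // boundary can be cut in half; acceptable for a sketch.
      return messages[i].content.slice(0, MAX_PREFIX_CHARS);
    }
  }
  return null;
}
```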

Videos

Live end-to-end demo (wrangler dev + real Morph API): an easy prompt on the frontier tier routes to Sonnet instead of the static Opus pick, a heavyweight planning prompt confirms Opus (confidence 99%), the balanced tier picks across providers, and a repeat turn in the same conversation serves from the Durable Object cache in ~22 ms with no router call:

Morph router demo

Full quality: demo.mp4

Verification

  • Ran the worker locally (wrangler dev + local secrets-store values + morph_router_enabled=true in local KV) and sent realistic mirror payloads against the live Morph API: easy/hard/balanced prompts produced differentiated decisions, reverse-mapped to Kilo public ids (see video).
  • Verified decision cache: a second request in the same conversation returned the identical decision in ~22 ms without a Morph call.
  • Verified independence: with an invalid OpenRouter key the classifier errors while the router decision is still returned (and vice versa — a Morph 503 leaves classification untouched).
  • Verified default-off: with the KV flag unset, no Morph request is made and decision stays null.
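
The independence and default-off checks above correspond to a pattern along these lines. This is a sketch under assumptions rather than the worker's actual code: runShadow, settle, and Outcome are hypothetical names, and the flag is assumed to be stored as the string "true", matching the local KV value used in verification.

```typescript
// Hypothetical sketch: names and shapes are illustrative only.
type Outcome<T> = { ok: true; value: T } | { ok: false; error: string };

// Converts a rejection (or synchronous throw) into a value, so one failing
// task cannot reject the combined await below.
async function settle<T>(task: () => Promise<T>): Promise<Outcome<T>> {
  try {
    return { ok: true, value: await task() };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

async function runShadow<C, D>(
  flagValue: string | null,
  classify: () => Promise<C>,
  route: () => Promise<D>,
): Promise<{ classification: Outcome<C>; decision: D | null }> {
  // Fail-closed: anything other than the literal "true" (including a
  // missing key) leaves the router uncalled.
  const enabled = flagValue === "true";
  const [classification, routed] = await Promise.all([
    settle(classify),
    enabled ? settle(route) : Promise.resolve<Outcome<D> | null>(null),
  ]);
  const decision = routed !== null && routed.ok ? routed.value : null;
  return { classification, decision };
}
```

Because both tasks are wrapped before being awaited together, a classifier error still yields a router decision, a router error still yields a classification, and an unset flag yields decision: null without invoking the router at all.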

Visual Changes

N/A

Reviewer Notes

  • Shadow mode only — nothing consumes decision yet. The intended follow-up is a flagged synchronous path in resolveAutoModel once the shadow logs show good agreement/cost numbers; happy to discuss rollout shape.
  • New third-party data flow: bounded prompt prefixes go to Morph when the flag is on. Flag is off by default precisely so this can clear privacy/DPA review first. (Disclosure: I'm on the Morph team — shrey@morphllm.com.)
  • The Kilo↔Morph model id map in morph-router.ts is deliberately small and explicit; candidates without a mapping are dropped before the call, and a tier with <2 routable candidates skips routing entirely (skipped:insufficient_candidates in logs).
  • Per .specs/model-experiments.md, experimented public ids must not enter kilo-auto candidate sets; the new candidate map documents this constraint at the point where it could be violated.
  • MORPH_API_KEY needs to be added to the secrets store before enabling the flag; binding is already declared in wrangler.jsonc, and .dev.vars.example covers local dev.
  • Full pnpm typecheck was not run (per AGENTS.md guidance); scripts/typecheck-all.sh --changes-only plus a targeted pnpm --filter web typecheck both pass.
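
For reviewers who want the candidate-mapping rule from the notes above in code form, here is a minimal sketch of the pre-call planning step. planRouting, RoutingPlan, and the id map passed in are hypothetical; the real map lives in morph-router.ts and the actual request body sent to Morph is not reproduced here. How an unmapped static pick is handled is also an assumption (it is left undefined in this sketch).

```typescript
// Hypothetical sketch: not the contents of morph-router.ts.
type Tier = "frontier" | "balanced";
type Policy = "capability_heavy" | "balanced";

type RoutingPlan =
  | { kind: "route"; morphCandidates: string[]; policy: Policy; defaultModel: string | undefined }
  | { kind: "skipped"; reason: "insufficient_candidates" };

// Tier-derived policy, as described in the summary.
const POLICY_BY_TIER: Record<Tier, Policy> = {
  frontier: "capability_heavy",
  balanced: "balanced",
};

function lookup(idMap: Record<string, string>, id: string): string | undefined {
  // Own-property check avoids matching inherited keys such as "toString".
  return Object.prototype.hasOwnProperty.call(idMap, id) ? idMap[id] : undefined;
}

function planRouting(
  tier: Tier,
  kiloCandidates: string[],
  staticPick: string,
  idMap: Record<string, string>,
): RoutingPlan {
  // Drop every candidate that has no Morph catalog id.
  const morphCandidates: string[] = [];
  for (const kiloId of kiloCandidates) {
    const morphId = lookup(idMap, kiloId);
    if (morphId !== undefined) morphCandidates.push(morphId);
  }
  // Fewer than two routable candidates leaves nothing to choose between.
  if (morphCandidates.length < 2) {
    return { kind: "skipped", reason: "insufficient_candidates" };
  }
  return {
    kind: "route",
    morphCandidates,
    policy: POLICY_BY_TIER[tier],
    defaultModel: lookup(idMap, staticPick), // the static resolver's pick
  };
}
```

With a two-entry placeholder map, a frontier tier whose candidates include one unmapped id still routes between the two mapped ones under capability_heavy, while a tier with only one mapped candidate is skipped.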

🤖 Generated with Claude Code

The auto-routing worker can now produce real per-prompt routing decisions
via the Morph model router, filling the decision field the contract has
carried as null. The gateway sends tier routing context (candidates plus
the static resolver's pick) with each mirror; the worker consults Morph in
parallel with the existing classifier, caches decisions per conversation,
and returns them in shadow mode behind the morph_router_enabled KV flag.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>